Accurate Floating-Point Summation
Abstract
Given a vector of floating-point numbers with exact sum s, we present an algorithm for calculating a faithful rounding of s into the set of floating-point numbers, i.e. one of the immediate floating-point neighbors of s. If s is itself a floating-point number, we prove that s is the result of our algorithm. The algorithm adapts to the condition number of the sum, i.e. it is very fast for mildly conditioned sums, with slowly increasing computing time proportional to the condition number. All statements are also true in the presence of underflow. Furthermore, algorithms with K-fold accuracy are derived, where the result is stored in a vector of K floating-point numbers. We also present an algorithm for rounding the sum s to the nearest floating-point number. Our algorithms are fast in terms of measured computing time because they require neither special operations such as access to mantissa or exponent nor extra precision, and they contain no branch in the inner loop. The only operations used are standard floating-point addition, subtraction and multiplication in one working precision, for example double precision. Moreover, in contrast to other approaches, the algorithms are ideally suited for parallelization. We also sketch dot product algorithms with similar properties.
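To make the "addition and subtraction only, no branch" claim concrete, the following is a minimal Python sketch of a branch-free error-free transformation of a sum (Knuth's TwoSum) and a simple compensated summation built on it. This is not the faithful-rounding algorithm of the abstract; it is a simpler, related technique that gives roughly twice the working precision. The function names `two_sum`, `compensated_sum` and `condition_number` are chosen here for illustration, and Python's `math.fsum` is used only as a reference value.

```python
import math

def two_sum(a, b):
    """Branch-free error-free transformation (Knuth's TwoSum).

    Returns (s, e) with s = fl(a + b) and a + b == s + e exactly,
    barring overflow. Only additions and subtractions are used.
    """
    s = a + b
    z = s - a
    e = (a - (s - z)) + (b - z)
    return s, e

def compensated_sum(values):
    """Illustrative compensated summation (not the abstract's algorithm).

    Accumulates the rounding error of every addition separately and
    adds the accumulated correction at the end.
    """
    total = 0.0
    correction = 0.0
    for x in values:
        total, err = two_sum(total, x)
        correction += err
    return total + correction

def condition_number(values):
    """Condition number of the sum: sum(|p_i|) / |sum(p_i)|.

    Uses math.fsum as an accurate reference; assumes a nonzero sum.
    """
    return math.fsum(abs(v) for v in values) / abs(math.fsum(values))

# An ill-conditioned example: the exact sum is 2, condition number ~1e16.
data = [1e16, 1.0, -1e16, 1.0]
print(sum(data))              # 1.0 (naive summation loses one of the 1.0 terms)
print(compensated_sum(data))  # 2.0
print(math.fsum(data))        # 2.0 (reference)
```

The inner loop contains no branch and touches only working-precision values, which is the property the abstract emphasizes; the adaptive, faithfully rounded algorithms described there go further than this sketch.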
Similar resources
Accurate floating-point summation: a new approach
The aim of this paper is to find an accurate and efficient algorithm for evaluating the summation of large sets of floating-point numbers. We present a new representation of the floating-point number system in which a number is represented as a linear combination of integers and the coefficients are powers of the base of the floating-point system. The approach allows us to build up an accurate flo...
Group-Alignment based Accurate Floating-Point Summation on FPGAs
Floating-point summation is one of the most important operations in scientific and numerical computing applications and is also a basic subroutine (SUM) in the BLAS (Basic Linear Algebra Subprograms) library. However, summation algorithms based on standard floating-point arithmetic may not always produce accurate results because of possible catastrophic cancellation. To make the situation worse, the seq...
Error-free transformations in real and complex floating point arithmetic
Error-free transformation is a concept that makes it possible to compute accurate results within a floating point arithmetic. Up to now, it has only been studied for real floating point arithmetic. In this short note, we recall the known error-free transformations for real arithmetic and we propose some new error-free transformations for complex floating point arithmetic. This will make it possib...
Accurate Sum and Dot Product
Algorithms for summation and dot product of floating point numbers are presented which are fast in terms of measured computing time. We show that the computed results are as accurate as if computed in twice or K-fold working precision, K ≥ 3. For twice the working precision our algorithms for summation and dot product are some 40 % faster than the corresponding XBLAS routines while sharing simi...
Accurate summation, dot product and polynomial evaluation in complex floating point arithmetic
Article history: Available online 30 March 2012
Twofold fast summation
Debugging the accumulation of floating-point errors is hard; ideally, the computer should track it automatically. Here we consider a twofold approximation of an exact real number by a value + error pair of floating-point numbers. Normally, the value + error sum is more accurate than the value alone, so the error can estimate the deviation between the value and its exact target. Fast summation algorithm, that provides twofold sum of ∑...